1. Field of the Invention
The present invention relates to the fields of publishing, document editing and manipulation, and displaying documents and images. More particularly, the present invention relates to paginating, extracting, synchronizing, and displaying a document in electronic form.
2. Art Background
As the development of multimedia computer display systems continues to advance, more computing power and features are available to computer users. For example, information which has historically been limited to published paper documents is now being made available through on-line computing services from publishers and information vendors. As an increasing market share of the data and computing capacity is provided through low cost, high performance personal computers, some of the on-line information is also being made available on compact disk (CD) and magnetic media. Compact disk and magnetic media technology offer cost effective mass storage of documents, images and other data, in a format readily accessible for use with personal computers in a home or office environment. The combination of personal computers, compact disk technology and multimedia interactive graphic user interfaces permits the access and display of textual and graphic information by personal computer (PC) users in a manner not previously known in the industry. The type of information potentially available to a PC user includes professional and technical publications, newspapers, magazines, and other scientific and literary data and images.
However, much of the information which is published through, for example, government sources, newspapers and magazines is not in machine readable form, but rather is printed on paper. Because of the amount of work and effort required to convert the printed information into a machine readable form, only a small portion of the total published information is currently available for use by PC users using magnetic disks, CDs and the like. In addition, the information which is in machine readable form is typically available either as an image of the original document or as a stream of text data. An image of a document has the advantage of presenting the information in its original format as published, including non-text material, such as drawings, equations, symbols, diagrams, etc. The viewer is familiar with the format, and the information is easily recognized and understood. However, since a document image is often stored as a bitmap, the content of the document cannot be easily searched or manipulated. Alternatively, a text data stream format has the advantage of presenting the information in a manipulable and searchable format. Unfortunately, in many cases, the format of presentation is not the format in which the information was originally published in print. Thus, users are often unfamiliar with the format, which inhibits easy navigation of the document and makes information difficult to find and use.
One example of the problem of reproducing originally published documents stored in machine readable form is the storage and display of United States patent documents by the United States Government. The United States Patent Office (herein referred to as the "PTO") provides magnetic tapes of issued U.S. patents and other documents, in the form of a scanned-in image, and as a separate stream of text data. The magnetic tape storing the text data does not include graphical illustrations such as drawings, charts, textual tables, or much in the way of formatting data. Thus, the reproduction of a United States patent from PTO Text Files stored on magnetic tape does not result in the display of a U.S. patent as originally published by the U.S. Government. An example of a well known system for displaying text files provided by the PTO is the LexPat® system provided by Mead Data, offered in conjunction with the Lexis® display system. Using the LexPat® system, the display of a U.S. patent on a terminal, such as a PC, results in a display of text only, and does not include drawings, charts, graphs, or original formatting information. The text of a selected patent appears in ASCII format, but does not appear as the original patent issued by the PTO, and may not be referenced by the original column and line numbers from the published patent. Other systems display text files of periodicals such as the Wall Street Journal or legal documents such as contracts. However, the text files do not appear as the original documents.
The U.S. Patent Office also provides magnetic tapes with image files comprising a scanned-in image of the original U.S. patent issued by the PTO and published by the U.S. Government. The image files provided on magnetic tape by the PTO are simply bitmap images of the original published patent. Because each image is a bitmap of the scanned original document, the entire patent is provided, including drawings, charts, graphs, text and the original format. However, a scanned document cannot be searched, edited, navigated or otherwise manipulated as easily as a text file.
As will be described, the present invention provides a method and apparatus for extracting, synchronizing, displaying, navigating and manipulating text and image documents simultaneously in electronic form. The present invention is described with particular reference to U.S. patent documents, and includes the process of extracting patent text and image data from magnetic tapes provided by the PTO, synchronizing the text and image data to recover the original format (i.e., columns and lines) of the original published patent, and displaying the formatted text along with images using a unique graphical user interface (GUI) workbench. Although the present invention is described with reference to patent documents, it will be appreciated that the invention has application to a variety of different types of documents and applications.
The present invention's graphical user interface permits a user to selectively view ASCII text documents as well as bitmapped scanned images simultaneously on a display. When used in conjunction with U.S. patent documents, the graphic user interface of the present invention allows a user, such as a patent attorney, to display and manipulate both textual and graphic portions of patents. The text of a patent may be viewed on the display as it was originally published by the PTO, including column and line numbers. Simultaneously, the user may view the figures of a patent in the form of an image comprising a bitmap. Various functions are provided by the present invention for viewing, manipulating and displaying the patent documents. In order to assist the reader in understanding graphic user interface (GUI) technology, it is suggested that certain references be considered for background. Many user interfaces utilize metaphors in the design of the interface as a way of maximizing human familiarity, and conveying information between the user and the computer. Through the use of familiar metaphors, such as desktops, notebooks, spreadsheets, and the like, the interface takes advantage of existing human mental structures to permit a user to draw upon the metaphor analogy to understand the requirements of the particular computer system. (See, for example, Patrick Chan, "Learning Considerations in User Interface Design: The Room Model", Report CS-84-16, University of Waterloo, Computer Science Department, Ontario, Canada, July, 1984 and the references cited therein.) In addition, the reader is referred to the following references which describe various aspects, methods and apparatus associated with prior art graphic user interface design: U.S. Pat. No. Re. 32,632; U.S. Pat. No. 4,931,783; U.S. Pat. No. 5,072,412; and U.S. Pat. No. 5,148,154, and the references cited therein.
As will be described more fully below, the present invention's graphic user interface is based on a desktop "windows" metaphor, and provides the user with the ability to simultaneously display text and image documents in either a synchronized or an unsynchronized fashion.